feat(sessions): Add EAP double-write for user sessions #5588

Open
noahsmartin wants to merge 10 commits into master from sessions-eap-double-write

noahsmartin (Contributor) commented Jan 28, 2026

When the UserSessionsEap feature flag is enabled, session data is sent both through the legacy metrics pipeline and directly to the snuba-items topic as TRACE_ITEM_TYPE_USER_SESSION TraceItems.

This enables migration to the new EAP-based user sessions storage.

Cursor Bugbot found 2 potential issues for commit 1a745cd

When the `UserSessionsEap` feature flag is enabled, session data is sent
both through the legacy metrics pipeline and directly to the snuba-items
topic as TRACE_ITEM_TYPE_USER_SESSION TraceItems.

This enables migration to the new EAP-based user sessions storage.

Includes a Datadog metric `sessions.eap.produced` tagged with
`session_type` (update/aggregate) to track EAP writes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
noahsmartin force-pushed the sessions-eap-double-write branch from 21e22bf to 6acd029 on January 28, 2026 20:49

Dav1dde (Member) left a comment

I didn't review in depth, as we're going to need a different approach.

Session processing can stay as it is: we still need to extract sessions into metrics and aggregate them through the metrics pipeline to keep aggregation in place.

Right now, you can check the implementation in services/store.rs where session metrics are separated from other "generic" metrics into the sessions topic. This is going to be the entry point where we can start double writing.
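
To make the entry point concrete, here is a minimal sketch of the routing decision described above. The `Namespace` and `Topic` enums and the `topics_for` function are hypothetical stand-ins for illustration, not the actual types in services/store.rs; the sketch only assumes that session buckets keep going to the sessions topic and additionally go to the items topic when double writing is on.

```rust
/// Hypothetical metric namespace; not Relay's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Namespace {
    Sessions,
    Other,
}

/// Hypothetical Kafka topics; `Items` stands for the snuba-items topic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Topic {
    Sessions,
    GenericMetrics,
    Items,
}

/// Returns every topic a bucket in `namespace` should be produced to.
/// The legacy topic is always included; the items topic is added only
/// for session buckets when double writing is enabled.
fn topics_for(namespace: Namespace, double_write: bool) -> Vec<Topic> {
    match namespace {
        Namespace::Sessions if double_write => vec![Topic::Sessions, Topic::Items],
        Namespace::Sessions => vec![Topic::Sessions],
        Namespace::Other => vec![Topic::GenericMetrics],
    }
}

fn main() {
    println!("{:?}", topics_for(Namespace::Sessions, true));
    println!("{:?}", topics_for(Namespace::Other, true));
}
```

The legacy write is unconditional in this sketch, so turning double writing off again cannot affect the existing sessions pipeline.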

Long term, we'd want to have a more strictly typed version of aggregation for metrics, by re-using the generic metrics pipeline but properly typed. This will require some more work though.

Regarding rollout: what is the preferred rollout strategy? Via a feature flag? I recommend using a global rollout rate defined in "global config" instead; this is the approach we usually take for controlling double writes to Kafka.
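
As a rough illustration of what a rate-based rollout check could look like, here is a sketch under two assumptions: the rate is a fraction between 0.0 and 1.0, and the decision is keyed on the organization id so that a given organization is consistently in or out. The function name and the choice of key are assumptions for illustration, not necessarily what the global config option would use.

```rust
/// Hypothetical rollout check: `rate` is a fraction in 0.0..=1.0 read from
/// global config, `org_id` is the key that makes the decision stable.
fn is_rolled_out(org_id: u64, rate: f32) -> bool {
    // Map the id onto 100_000 slots and enable the lowest `rate` share of them.
    ((org_id % 100_000) as f32 / 100_000.0) < rate
}

fn main() {
    let rate = 0.25;
    for org_id in [1_u64, 24_999, 25_000, 99_999] {
        println!("org {org_id}: {}", is_rolled_out(org_id, rate));
    }
}
```

A rate of 0.0 disables the double write everywhere and 1.0 enables it everywhere, so the rollout can be ramped up or rolled back without a deploy.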

noahsmartin force-pushed the sessions-eap-double-write branch 3 times, most recently from ed7c367 to 1a745cd on February 4, 2026 20:33
noahsmartin marked this pull request as ready for review on February 4, 2026 23:11
noahsmartin requested a review from a team as a code owner on February 4, 2026 23:11
noahsmartin force-pushed the sessions-eap-double-write branch from 1a745cd to d56e867 on February 5, 2026 17:29
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Dav1dde force-pushed the sessions-eap-double-write branch from 2806e9e to 3dfbfcd on February 6, 2026 12:21

Dav1dde (Member) commented Feb 6, 2026

I made some changes:

  • Dropped the metric; it is redundant with processing.event.produced, which we already emit.
  • Simplified the conversion of metrics: each individual bucket can be converted to an EAP item without having to aggregate or collect them first.
  • Added explicit conversion code from Bucket to TraceItem in a separate module.
  • Simplified the conversion code: no allocations where they are not needed, and a slight "rustification".
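
The per-bucket conversion described in the list above can be sketched roughly as follows. `Bucket` and `EapItem` here are simplified, hypothetical stand-ins (not Relay's `Bucket` or the `TraceItem` protobuf), and the `metric_name` attribute key is made up for illustration. The point is only the shape: one bucket in, one item out, taking the bucket by value so its tags are moved rather than cloned.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a metric bucket.
#[derive(Debug)]
struct Bucket {
    timestamp: u64,
    name: String,
    value: f64,
    tags: BTreeMap<String, String>,
}

/// Hypothetical, simplified stand-in for an EAP item.
#[derive(Debug, PartialEq)]
struct EapItem {
    timestamp: u64,
    value: f64,
    attributes: BTreeMap<String, String>,
}

impl From<Bucket> for EapItem {
    fn from(bucket: Bucket) -> Self {
        // The bucket is consumed, so the tag map and name are moved, not cloned.
        let mut attributes = bucket.tags;
        attributes.insert("metric_name".to_owned(), bucket.name);
        EapItem {
            timestamp: bucket.timestamp,
            value: bucket.value,
            attributes,
        }
    }
}

/// Lazily converts buckets one by one; nothing is aggregated or collected first.
fn to_items(buckets: impl IntoIterator<Item = Bucket>) -> impl Iterator<Item = EapItem> {
    buckets.into_iter().map(EapItem::from)
}

fn main() {
    let bucket = Bucket {
        timestamp: 1_700_000_000,
        name: "session".to_owned(),
        value: 1.0,
        tags: BTreeMap::from([("release".to_owned(), "1.0.0".to_owned())]),
    };
    for item in to_items(vec![bucket]) {
        println!("{item:?}");
    }
}
```

Because `to_items` returns an iterator, the caller can produce each item to Kafka as it is converted instead of building an intermediate collection.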

The big question for me is how the UUIDs for trace_id and item_id are supposed to be generated. Your code derived the ID deterministically, but this will result in items with the same item and trace ID.
I removed that code for now and replaced it with a TODO.
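
One reading of the collision concern, sketched with made-up inputs: if the ID is derived only from fields that several buckets share, those buckets end up with identical IDs. The inputs to `derive_id` below (organization, project, timestamp) are an assumption for illustration; the original derivation is not shown in this thread.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical deterministic derivation from fields shared by many buckets.
fn derive_id(org_id: u64, project_id: u64, timestamp: u64) -> u64 {
    // `DefaultHasher::new()` uses fixed keys, so equal inputs give equal output.
    let mut hasher = DefaultHasher::new();
    (org_id, project_id, timestamp).hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Two different buckets (say, different releases) for the same project
    // and timestamp: the derived ids are identical.
    let first = derive_id(1, 42, 1_700_000_000);
    let second = derive_id(1, 42, 1_700_000_000);
    assert_eq!(first, second);
    println!("both buckets get id {first:016x}");
}
```

A randomly generated ID avoids this kind of collision, at the cost of no longer being reproducible from the bucket's contents.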

Dav1dde force-pushed the sessions-eap-double-write branch from 3dfbfcd to 1ab932d on February 6, 2026 12:27

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Dav1dde (Member) commented Feb 6, 2026

We decided that, for the moment, we prefer using a random ID for both the trace ID and the item ID.
